ClickHouse usage analytics: events/gauges tables with daily MV #3

Merged
lohanidamodar merged 110 commits into main from claude/rebuild-analytics-clickhouse-OHWGZ
May 13, 2026

@lohanidamodar lohanidamodar commented Mar 14, 2026

Summary

Complete rewrite of the usage analytics library with a two-table architecture optimized for both real-time analytics and billing.

Architecture

  • Events table (MergeTree) — raw request events with dedicated columns for path, method, status, resource, resourceId, country (LowCardinality), userAgent
  • Gauges table (MergeTree) — simple resource snapshots (metric, value, time, tags)
  • Daily MV (SummingMergeTree) — pre-aggregates events by metric + tenant + day for fast billing
  • No periods — query-time aggregation via toStartOfHour/toStartOfDay instead of write-time fan-out
  • Single write path — collect(metric, value, type, tags) routes to correct table; event columns auto-extracted from tags
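A minimal sketch of the single write path described above, written in Python purely for illustration (the library itself is PHP). The function name `route`, the string type values `"event"` and `"gauge"`, and the idea that leftover tags stay in a `tags` field on event rows are assumptions; only the seven column names come from the Events table bullet.

```python
from typing import Any

# Column names taken from the Events table bullet; everything else is hypothetical.
EVENT_COLUMNS = ("path", "method", "status", "resource", "resourceId", "country", "userAgent")


def route(metric: str, value: int, type_: str, tags: dict[str, Any]) -> tuple[str, dict[str, Any]]:
    """Return (table, row) for one collected metric."""
    if type_ == "gauge":
        # Gauges keep their tags as-is: metric, value, tags.
        return "gauges", {"metric": metric, "value": value, "tags": dict(tags)}
    if type_ != "event":
        raise ValueError(f"unknown type: {type_}")

    remaining = dict(tags)
    row: dict[str, Any] = {"metric": metric, "value": value}
    # Promote known tags to dedicated columns; keep the rest as free-form tags.
    for column in EVENT_COLUMNS:
        if column in remaining:
            row[column] = remaining.pop(column)
    row["tags"] = remaining
    return "events", row
```

For example, `route("requests", 1, "event", {"path": "/v1/users", "region": "eu"})` places `path` in its own column and leaves `region` in `tags`.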

Key Changes

  • Two separate tables instead of one — events have 7 extra columns gauges don't need
  • Plain MergeTree for both tables — raw appends, query-time aggregation
  • Daily MV with minimal schema (metric, value, time, tenant)
  • LowCardinality(String) for country column
  • Bloom filter indexes on all filterable columns
  • String tenant — setTenant(?string)
  • Utopia Query for all read operations — parameterized queries, no SQL injection risk

API

Write

  • collect(metric, value, type, tags) — buffer with auto-flush
  • addBatch(metrics, type) — direct batch insert
  • flush() — write buffered metrics

Read

  • find(queries, type) / count(queries, type) / sum(queries, attr, type)
  • getTotal(metric, queries, type) — SUM for events, argMax for gauges
  • getTotalBatch(metrics, queries, type) — batch totals
  • getTimeSeries(metrics, interval, start, end, queries, zeroFill, type)
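The difference between the two getTotal aggregations can be sketched in Python (illustration only, not the library's PHP). Events are additive, so their total is a SUM; a gauge is a snapshot, so its total is the value at the latest timestamp, which is what ClickHouse's `argMax(value, time)` returns. The row shape and the function name `get_total` are assumptions.

```python
from typing import Any


def get_total(rows: list[dict[str, Any]], type_: str) -> float:
    """Aggregate rows for one metric: SUM for events, latest value for gauges."""
    if not rows:
        return 0
    if type_ == "event":
        # Equivalent in spirit to SELECT sum(value).
        return sum(row["value"] for row in rows)
    if type_ == "gauge":
        # Equivalent in spirit to SELECT argMax(value, time).
        return max(rows, key=lambda row: row["time"])["value"]
    raise ValueError(f"unknown type: {type_}")
```

Summing gauge snapshots would count the same resource once per snapshot, which is why the two types cannot share one aggregation.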

Billing (Daily MV)

  • findDaily(queries) / sumDaily(queries) / sumDailyBatch(metrics, queries)

Test Plan

  • Unit tests for Metric schema, getters, validation
  • Integration tests for ClickHouse and Database adapters
  • PHPStan level max passing
  • Linter passing
  • Security audit — no SQL injection vulnerabilities
  • CI green (CodeQL, Tests, Linter)

- Database adapter
- ClickHouse adapter
- Removed hardcoded column definitions in Usage class, replacing with dynamic schema derived from SQL adapter.
- Introduced new Query class for building ClickHouse queries with fluent interface.
- Added support for advanced query operations including find and count methods.
- Enhanced error handling and SQL injection prevention mechanisms.
- Created comprehensive usage guide for ClickHouse adapter.
- Added unit tests for Query class to ensure functionality and robustness.
- Maintained backward compatibility with existing methods while improving overall architecture.
…on, getTotal ambiguity

- Buffer key now includes tag hash so events with same metric but
  different tags (e.g. different paths) stay as separate entries
  instead of silently discarding the second call's tags

- Daily table queries (findDaily, sumDaily, sumDailyBatch) now validate
  attributes against the daily schema (metric, value, time, tenant)
  instead of the full event schema. Querying path/method/status on the
  daily table now throws immediately instead of causing a ClickHouse
  "No such column" runtime error

- Changed (int) cast to (float) for agg_value in getTimeSeries to avoid
  truncating fractional gauge values or large event sums

- getTotal() now throws when a metric exists in both event and gauge
  tables instead of silently adding incompatible aggregations
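The buffer-key fix in the first bullet above can be sketched as follows, in Python for illustration only. Hashing a JSON encoding of the tags with md5 follows the commit messages in this PR; sorting the keys before encoding and accumulating values for identical keys are assumptions, as are the names `buffer_key` and `collect`.

```python
import hashlib
import json
from typing import Any


def buffer_key(metric: str, tags: dict[str, Any]) -> str:
    """Build a buffer key from the metric name plus a hash of its tags."""
    # sort_keys makes the key independent of tag insertion order (an assumption).
    encoded = json.dumps(tags, sort_keys=True, separators=(",", ":"))
    digest = hashlib.md5(encoded.encode("utf-8")).hexdigest()
    return f"{metric}:{digest}"


def collect(buffer: dict[str, dict[str, Any]], metric: str, value: int, tags: dict[str, Any]) -> None:
    """Add a value to the buffer entry for this metric and tag set."""
    key = buffer_key(metric, tags)
    entry = buffer.setdefault(key, {"metric": metric, "value": 0, "tags": dict(tags)})
    entry["value"] += value
```

With a metric-only key, a second call with a different `path` would land in the first call's entry and lose its tags; with the tag hash it gets its own entry.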

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment thread src/Usage/Adapter/ClickHouse.php
Comment thread src/Usage/Adapter.php Outdated
Comment thread src/Usage/Metric.php
PHPStan level max flagged getTimeSeries() annotations as int while the
ClickHouse adapter emits floats via agg_value cast. Updates the abstract,
both adapters, the Usage facade, and zeroFillTimeSeries to float.

Also throws on json_encode failure in Usage::collect so the md5() input is
guaranteed string instead of string|false.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread src/Usage/Adapter/ClickHouse.php Outdated
Comment thread src/Usage/Adapter/ClickHouse.php
Comment thread src/Usage/Adapter/Database.php
…ts-only sum

- extractGroupByInterval: match by method string, not instanceof (parsed queries are base Query)
- flush(): selectively clear buffer on per-batch success (retry preserved on failure)
- collect(): use TYPE_EVENT constant instead of string literal
- addBatch(): require explicit $type param (no default)
- sum(): events-only by default (summing gauges is meaningless)
- sumDaily*: document as events-only (daily MV has only events)
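The flush() behaviour in the list above (clear only the batches that were written, keep failed ones for retry) might look like this Python sketch. The `write_batch` callable, the batch size, and the boolean return value are hypothetical; the real adapter's interface is not shown in this PR description.

```python
from typing import Any, Callable


def flush(buffer: list[Any], write_batch: Callable[[list[Any]], None], batch_size: int = 1000) -> bool:
    """Write the buffer in batches; keep only the batches that failed."""
    remaining: list[Any] = []
    all_ok = True
    for start in range(0, len(buffer), batch_size):
        batch = buffer[start:start + batch_size]
        try:
            write_batch(batch)
        except Exception:
            # Preserve the failed batch so a later flush() can retry it.
            remaining.extend(batch)
            all_ok = False
    # Replace the buffer contents in place with whatever still needs writing.
    buffer[:] = remaining
    return all_ok
```

Clearing the whole buffer up front would lose the failed rows; clearing nothing would re-insert the rows that already succeeded.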

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment thread src/Usage/Adapter/ClickHouse.php
Push the count cap down into the DB layer so callers that only need a
capped total (e.g. rendering "5000+") can stop ClickHouse early instead
of scanning the full filtered set. ClickHouse wraps the count in a
LIMIT-bounded subquery; Database delegates to utopia-php/database's
existing $max arg.
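The effect of a LIMIT-bounded count can be illustrated in Python with a lazy row source standing in for the filtered table. This is an analogy, not the adapter's code: the function name `capped_count` is hypothetical, and the SQL shape in the comment is one plausible reading of "wraps the count in a LIMIT-bounded subquery".

```python
from itertools import islice
from typing import Iterable, Optional


def capped_count(rows: Iterable[object], max_rows: Optional[int] = None) -> int:
    """Count rows, stopping after max_rows instead of scanning everything.

    Roughly analogous to:
        SELECT count() FROM (SELECT 1 FROM <table> WHERE <filters> LIMIT <max>)
    """
    if max_rows is None:
        return sum(1 for _ in rows)
    # islice stops pulling from the source once max_rows items were seen.
    return sum(1 for _ in islice(rows, max_rows))
```

A caller rendering "5000+" would pass `max_rows=5000` and treat a result equal to the cap as "at least 5000".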

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread src/Usage/Adapter/ClickHouse.php Outdated
Adds keyset-pagination cursor support (cursorAfter / cursorBefore) to the
ClickHouse adapter via parseQueries. Cursor values accept Metric/ArrayObject
or plain associative arrays; an `id` tiebreaker is auto-appended to ORDER BY
so pagination is deterministic on non-unique columns. cursorBefore flips
direction at SQL build time and reverses results post-fetch.

Rejects two unsafe combinations: cursor + groupByInterval (no stable identity
on aggregated rows), and cursor + null type (paginating across events and
gauges has no coherent ordering).
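The keyset pagination described above can be sketched in memory with Python (illustration only; the adapter builds SQL instead). The sketch assumes ascending order, and the function name `page` and the row shape are hypothetical. It shows the two points from the commit message: `id` is appended as a tiebreaker, and cursorBefore walks backwards and then reverses the result.

```python
from typing import Any, Optional, Sequence

Row = dict[str, Any]


def page(rows: Sequence[Row], order_by: Sequence[str], limit: int,
         cursor_after: Optional[Row] = None, cursor_before: Optional[Row] = None) -> list[Row]:
    """Return one page of rows in ascending order using keyset cursors."""
    if cursor_after is not None and cursor_before is not None:
        raise ValueError("use either cursor_after or cursor_before, not both")

    keys = list(order_by)
    if "id" not in keys:
        keys.append("id")  # tiebreaker so non-unique columns paginate deterministically

    def sort_key(row: Row) -> tuple:
        return tuple(row[k] for k in keys)

    ordered = sorted(rows, key=sort_key)
    if cursor_before is not None:
        bound = sort_key(cursor_before)
        # Walk backwards from the cursor, then restore ascending order.
        earlier = [r for r in reversed(ordered) if sort_key(r) < bound]
        return list(reversed(earlier[:limit]))
    if cursor_after is not None:
        bound = sort_key(cursor_after)
        ordered = [r for r in ordered if sort_key(r) > bound]
    return ordered[:limit]
```

Without the `id` tiebreaker, two rows with the same `time` could both be skipped or both be repeated across pages.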

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread src/Usage/Adapter/Database.php Outdated
Three small follow-ups based on review feedback on the audit PR's twin
implementation:

- Drop the always-true `!empty($orderAttributes)` guard inside the cursor
  branch. resolveCursorOrder() always appends an `id` tiebreaker, so the
  guard is dead code and was misleading.
- normalizeCursorRow now removes `$id` after copying it to `id`, so cursor
  state is no longer carrying both keys.
- Throw an explicit Exception when a cursor value is null. The previous
  path silently routed null `time` cursors through formatDateTime(null)
  which returns the current timestamp — a misconfigured cursor would
  filter on `time < now()` and produce wrong pages instead of failing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread src/Usage/Adapter/ClickHouse.php Outdated
Comment thread src/Usage/Adapter/ClickHouse.php Outdated
Adds notEqual, notContains, notBetween, isNull, isNotNull, startsWith,
endsWith — keeping the supported Query method set in line with the audit
ClickHouse adapter. startsWith / endsWith use ClickHouse's built-in
functions of the same name; isNull / isNotNull emit `IS NULL` / `IS NOT
NULL` (no value binding); the rest follow the existing param-bound pattern.

The shared parseQueries logic is now consistent across both adapters:
- getParamType() centralises the column → ClickHouse-type mapping (time
  → DateTime64(3), value → Int64, default → String). Previously each case
  had an inline `if ($attribute === 'time')` branch.
- formatTypedValue() routes DateTime-typed values through formatDateTime
  and everything else through formatParamValue, so each case has one code
  path.
- buildCursorWhere() uses the same dispatch.
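A rough Python rendering of the dispatch described above (the real code is PHP). The type mapping comes from the commit message; the bodies of `format_datetime` and `format_param_value`, including the millisecond UTC string format, are assumptions made so the sketch runs.

```python
from datetime import datetime, timezone
from typing import Any


def get_param_type(attribute: str) -> str:
    """Map a column name to its ClickHouse parameter type."""
    if attribute == "time":
        return "DateTime64(3)"
    if attribute == "value":
        return "Int64"
    return "String"


def format_datetime(value: datetime) -> str:
    # Assumed format: UTC with millisecond precision. Expects an aware datetime.
    utc = value.astimezone(timezone.utc)
    return utc.strftime("%Y-%m-%d %H:%M:%S.") + f"{utc.microsecond // 1000:03d}"


def format_param_value(value: Any) -> str:
    # Assumed fallback: plain string conversion.
    return str(value)


def format_typed_value(attribute: str, value: Any) -> str:
    """Single code path: DateTime-typed columns go through format_datetime."""
    if get_param_type(attribute).startswith("DateTime"):
        return format_datetime(value)
    return format_param_value(value)
```

Each filter case can then call `format_typed_value` without its own `time` branch.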

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread src/Usage/Adapter/ClickHouse.php Outdated
Comment thread src/Usage/Adapter/ClickHouse.php
Comment thread src/Usage/Adapter/ClickHouse.php Outdated
…e strictness

- ClickHouse purge() now also deletes from the daily aggregated table when
  purging events. Materialized views are forward-only, so purges on the
  source table left stale daily rows behind. Daily delete is skipped if any
  query references an event-only column (path/method/etc).
- ClickHouse getTotalBatch() now raises when a metric appears in both the
  event and gauge tables under $type=null, matching the existing safeguard
  in getTotal(). Mixing SUM (events) with argMax (gauges) silently produced
  meaningless totals.
- Usage::setNamespace/setTenant/setSharedTables now flush the buffer before
  changing adapter context. Buffered metrics carry no context, so changing
  it pre-flush would write them under the new context.
- Database adapter now stores a 'type' field per document and filters by it
  in find/count/purge/getTotal when $type is non-null. Previously the $type
  argument was accepted but ignored, returning rows of both kinds.
- composer.json: add 'test' script.
- .github/workflows: bump actions/checkout v3 -> v4.
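The flush-before-context-change rule for setNamespace/setTenant/setSharedTables described above can be illustrated with a small Python sketch. `BufferedUsage` and `RecordingAdapter` are hypothetical stand-ins, not the library's classes; the point is only the ordering of flush and context change.

```python
from typing import Any, Optional


class RecordingAdapter:
    """Hypothetical adapter that records which tenant each write used."""

    def __init__(self) -> None:
        self.tenant: Optional[str] = None
        self.written: list[tuple[Optional[str], list[Any]]] = []

    def write(self, rows: list[Any]) -> None:
        self.written.append((self.tenant, rows))


class BufferedUsage:
    def __init__(self, adapter: RecordingAdapter) -> None:
        self.adapter = adapter
        self.buffer: list[Any] = []

    def collect(self, metric: str, value: int) -> None:
        # Buffered entries carry no tenant; the adapter supplies it at write time.
        self.buffer.append((metric, value))

    def flush(self) -> None:
        if self.buffer:
            self.adapter.write(list(self.buffer))
            self.buffer.clear()

    def set_tenant(self, tenant: Optional[str]) -> None:
        # Flush first so buffered metrics are written under the old tenant.
        self.flush()
        self.adapter.tenant = tenant
```

If `set_tenant` skipped the flush, metrics collected for tenant "a" would be written under tenant "b" on the next flush.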

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread src/Usage/Adapter/ClickHouse.php Outdated
Comment thread src/Usage/Adapter/ClickHouse.php Outdated
Mirrors the validator pattern in utopia-php/database
(Validator/Query/Filter.php): contains/notContains/equal/etc. queries
must have at least one value; an empty values array is rejected up front
with `<Method> queries require at least one value.` instead of silently
producing a "no filter applied" WHERE clause.

Without the guard, `Query::contains('metric', [])` would skip the IN
clause entirely and return all rows — exactly the opposite of the
intended IN () semantics, which should match nothing.

Applies the same VALUE_REQUIRED_METHODS allow-list and pre-switch check
that the audit adapter uses, so both libraries reject the same set of
empty-value filter methods consistently.
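A minimal Python sketch of the pre-switch guard described above. The exact contents of VALUE_REQUIRED_METHODS are not listed in this PR description, so the set below is a partial, assumed one; capitalising the method name in the message is likewise a guess at how `<Method>` is rendered.

```python
from typing import Any, Sequence

# Partial, assumed allow-list; the real constant may contain more methods.
VALUE_REQUIRED_METHODS = frozenset({"equal", "notEqual", "contains", "notContains"})


def validate_values(method: str, values: Sequence[Any]) -> None:
    """Reject value-requiring filters that were given an empty values array."""
    if method in VALUE_REQUIRED_METHODS and len(values) == 0:
        label = method[0].upper() + method[1:]
        raise ValueError(f"{label} queries require at least one value.")
```

Running this check before the per-method switch means `contains('metric', [])` fails fast instead of silently dropping its filter.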

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread src/Usage/Adapter/Database.php
Comment thread src/Usage/Adapter/ClickHouse.php
@lohanidamodar lohanidamodar requested a review from loks0n May 5, 2026 07:37
… retry dedup, gauge order, cross-type validation, Database value check)
Comment thread src/Usage/Adapter/ClickHouse.php
Comment thread src/Usage/Adapter/ClickHouse.php
The Client::fetch(url:, method:, body:) call surface and METHOD_*
constants used by the ClickHouse adapter are unchanged between 0.5
and 1.1. Bumping to ^1.1 so the library is installable alongside
appwrite/server-ce 1.9.x, which now requires utopia-php/fetch ^1.1
(appwrite/appwrite#12252).
Comment thread src/Usage/Adapter/Database.php
Database adapter: convertQueriesToDatabase() previously dropped
TYPE_NOT_EQUAL, TYPE_NOT_BETWEEN, TYPE_NOT_CONTAINS, TYPE_STARTS_WITH,
and TYPE_ENDS_WITH silently — the switch had no case for them so the
WHERE fragment was skipped, turning a "not X" or "starts with Y"
filter into a full-collection match. Adds all five.

ClickHouse adapter: find()/count()/getTimeSeries() with $type=null
queried both events and gauges. When the query referenced an
event-only attribute like 'path', the gauge iteration would throw
"Invalid attribute name: path" via parseQueries(). Adds a private
queriesMatchType() helper that pre-checks each filter attribute
against the type's schema and skips that type's table when any
attribute is missing from it. The caller now gets the events side
without the gauge crash, which is what null-type semantics should mean.
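The pre-check can be sketched in Python as follows (illustration only). The schema sets are assumptions pieced together from the column lists in the PR description, and `tables_to_query` is a hypothetical helper showing how a null type narrows to the tables that can satisfy the filters.

```python
from typing import Optional, Sequence

# Assumed schemas; the real column lists live in the library's Metric class.
SCHEMAS = {
    "event": {"id", "metric", "value", "time", "tenant", "path", "method", "status",
              "resource", "resourceId", "country", "userAgent"},
    "gauge": {"id", "metric", "value", "time", "tenant", "tags"},
}


def queries_match_type(attributes: Sequence[str], type_: str) -> bool:
    """True when every filter attribute exists in the given type's schema."""
    return all(attribute in SCHEMAS[type_] for attribute in attributes)


def tables_to_query(attributes: Sequence[str], type_: Optional[str] = None) -> list[str]:
    """With an explicit type, query it as-is; with None, skip types that cannot match."""
    if type_ is not None:
        return [type_]
    return [t for t in ("event", "gauge") if queries_match_type(attributes, t)]
```

A filter on `path` with a null type therefore reads only the events table, while a filter on `metric` still reads both.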

sum() takes type=TYPE_EVENT as a hard default and has no null-type path, so it needs no such check.
…-range scan efficiency

Previous ORDER BY of (tenant, id) sorted rows within each tenant by id
(a random UUID), so ClickHouse stored them in essentially random physical
order. Time-range predicates like WHERE time > X had to scan every
granule for the tenant because the primary index had no time information
to skip on.

Re-key to (tenant, metric, time, id) so the primary index matches
how the data is actually queried:
  - tenant: multi-tenant isolation (cheap first-level filter)
  - metric: per-metric series (most queries are scoped to one)
  - time: range scans now hit a small contiguous span instead of
    the whole table
  - id: tiebreaker for stable physical ordering

Gauges get the same shape. Daily MV already had the right key.
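Why the new key helps can be shown with a simplified Python model: when rows are sorted by (tenant, metric, time, id), all rows for one tenant, one metric, and one time range sit next to each other, so two binary searches find the span. This ignores ClickHouse's granule-level sparse index and is only an analogy; `scan_range` and the half-open time interval are assumptions of the sketch.

```python
from bisect import bisect_left
from typing import Sequence

Key = tuple  # (tenant, metric, time, id)


def scan_range(sorted_keys: Sequence[Key], tenant: str, metric: str, start: int, end: int) -> tuple[int, int]:
    """Return the [lo, hi) slice of rows with start <= time < end for one tenant and metric."""
    # A 3-tuple prefix sorts before every 4-tuple that shares it, so
    # bisect_left lands on the first row whose time is >= the bound.
    lo = bisect_left(sorted_keys, (tenant, metric, start))
    hi = bisect_left(sorted_keys, (tenant, metric, end))
    return lo, hi
```

With (tenant, id) as the key, the same rows would be scattered across the tenant's whole range and no such contiguous slice would exist.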

Drop the now-redundant bloom_filter indexes on metric and time
(primary key already covers them).

Pre-prod schema change — no migration path needed, just DROP+CREATE
on next deploy.

Updates MetricTest counts to match the trimmed index lists.
… Adapter

These were declared abstract on the base, forcing every implementation
to provide them even when the underlying backend has no multi-tenant
or namespace concept. No caller types against the abstract Adapter
to invoke them — every consumer goes through the Usage facade.

- Drop the three abstract method declarations from Adapter.
- Both ClickHouse and Database adapters keep their concrete impls
  (the methods are still needed for current usage).
- Facade now forwards via method_exists, so a future minimal adapter
  (no multi-tenancy, no namespacing) can extend Adapter without
  implementing dead stubs.
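The optional-forwarding idea can be sketched in Python, where `getattr` with a default plays the role of PHP's `method_exists` check. `ForwardingUsage` and the `set_tenant` method name are hypothetical stand-ins for the facade and its setters.

```python
from typing import Any, Optional


class ForwardingUsage:
    """Facade that forwards context setters only when the adapter supports them."""

    def __init__(self, adapter: Any) -> None:
        self._adapter = adapter

    def set_tenant(self, tenant: Optional[str]) -> "ForwardingUsage":
        # Python analogue of a method_exists() check before forwarding.
        setter = getattr(self._adapter, "set_tenant", None)
        if callable(setter):
            setter(tenant)
        return self
```

An adapter without any tenant concept can then be plugged in unchanged; the call becomes a no-op rather than requiring an empty stub method.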
@lohanidamodar lohanidamodar merged commit 9bff4b7 into main May 13, 2026
4 checks passed
@lohanidamodar lohanidamodar deleted the claude/rebuild-analytics-clickhouse-OHWGZ branch May 14, 2026 04:34